ABSTRACT

The asymptotic behavior of the systems X_{n+1} = X_n + a_n b(X_n,ξ_n) + a_n σ(X_n)ψ_n and dy = b̄(y)dt + √a(t) σ(y)dw is studied. Here {ψ_n} is i.i.d. Gaussian, {ξ_n} is a (correlated) bounded sequence of random variables, a_n ≈ A_0/log(A_1+n), and similarly a(t) = A_0/log(t+A_1). Without {ξ_n}, such algorithms are versions of the 'simulated annealing' method for global optimization. When the objective function values can only be sampled via Monte Carlo, the discrete algorithm is a combination of stochastic approximation and simulated annealing, and our forms are appropriate. The {ψ_n} are the 'annealing' variables, and {ξ_n} is the sampling noise. For large A_0, a full asymptotic analysis is presented via the theory of large deviations. It covers mean escape times (after arbitrary time n) from neighborhoods of stable sets of the algorithm, mean transition times (after arbitrary time n) from a neighborhood of one stable set to another, approximate asymptotic invariant measures, and the location of the values of {X_n} or y(·). Also treated are the case where Eb(x,ξ) = b̄(x) is the negative of the gradient of a function B(x), and the application to global function minimization via Monte Carlo methods.

Key words. Monte Carlo, stochastic approximation, large deviations, simulated annealing, global function optimization from noisy samples


1. Introduction


We study the asymptotic behavior of the system

(1.1) $$X_{n+1} = X_n + a_n b(X_n,\xi_n) + a_n\sigma(X_n)\psi_n, \qquad X_n\in\mathbb{R}^r,$$


where {ξ_n} is a sequence of bounded random variables and {ψ_n} is a sequence of zero-mean independent and identically distributed (i.i.d.) Gaussian random variables. The two sequences are mutually independent, and a_n = A_0/log(n+A_1) with A_0 > 0 and A_1 > 1. The σ(·) and b(·,ξ) are Lipschitz continuous, uniformly in ξ, and σ(·) is bounded.




Such stochastic approximation algorithms are a Monte Carlo version of the currently popular 'annealing' method for locating the global minimum of a function with many minima [1]-[3]. For example, let B(·) denote a continuously differentiable function and set Eb(x,ξ_n) = b̄(x) = -B_x(x). Suppose that noise-corrupted samples of b̄(x), namely b(x,ξ), are available from an experiment or a simulation on a system whose 'mean' performance is B(x) at parameter value x. Then the algorithm Y_{n+1} = Y_n + a_n b(Y_n,ξ_n) is a standard form of a stochastic approximation method for locating a local zero of b̄(·), or a local minimum of B(·), under appropriate conditions on {a_n}. The σ(x)ψ_n term might be added artificially, following the usual logic of the 'annealing' scheme, to force the sequence to jump around until it eventually 'settles' near a global minimum of B(·). When only random samples b(x,ξ) are available, the situation is much more complex than in the non-random sampling case. It is important to allow the {ξ_n} to be correlated, for two reasons. (a) Many efficient Monte Carlo methods (e.g., antithetic variables) require correlated noise. (b) Simulations are often run on a continuously operating system, where the noise is inherently correlated.
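The following Python sketch is for illustration only. It simulates the scalar form of (1.1) for a hypothetical double-well objective B(x) = (x^2 - 1)^2, so that b̄(x) = -B_x(x) = -4x(x^2 - 1). The sampling noise is a hypothetical bounded, correlated sequence built from antithetic pairs (ξ, -ξ), in the spirit of remark (a) above. The clamp to [-bound, bound] is an added assumption that keeps the iterates bounded. It stands in for the projection or penalty devices mentioned in Section 4 and is not part of (1.1).

```python
import math
import random


def make_antithetic_sampler(scale=0.5):
    """Hypothetical noisy sampler of bbar(x) = -B'(x) for B(x) = (x^2 - 1)^2.

    The noise xi_n is bounded and correlated: draws come in antithetic
    pairs (xi, -xi), so consecutive samples are negatively correlated.
    """
    state = {"pending": None}

    def b_sample(x, rng):
        if state["pending"] is None:
            xi = rng.uniform(-scale, scale)  # fresh draw
            state["pending"] = -xi           # its antithetic partner comes next
        else:
            xi = state["pending"]
            state["pending"] = None
        return -4.0 * x * (x * x - 1.0) + xi

    return b_sample


def simulate_annealing_sa(b_sample, sigma, x0, n_steps, A0=0.05, A1=2.0,
                          bound=2.0, seed=0):
    """Sketch of the scalar recursion (1.1):
        X_{n+1} = X_n + a_n b(X_n, xi_n) + a_n sigma(X_n) psi_n,
    with a_n = A0 / log(n + A1) and psi_n i.i.d. N(0, 1).
    The clamp to [-bound, bound] is an ASSUMED safeguard, not part of (1.1)."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for n in range(n_steps):
        a_n = A0 / math.log(n + A1)      # slowly decreasing gain (A1 > 1)
        psi = rng.gauss(0.0, 1.0)         # Gaussian 'annealing' variable
        x = x + a_n * b_sample(x, rng) + a_n * sigma(x) * psi
        x = max(-bound, min(bound, x))    # keep the iterate bounded
        path.append(x)
    return path
```

For example, `simulate_annealing_sa(make_antithetic_sampler(), lambda x: 1.0, -1.0, 1000)` starts in the left well with σ ≡ 1. A fresh sampler is created for each run because the antithetic pairing keeps internal state.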


The theory of large deviations [4], [5], [7] provides the appropriate methods, and allows us to obtain a fairly complete characterization of the asymptotic location and behavior of the {X_n}.

If a_n decreases 'faster' than O(1/log n), then (under broad conditions) {X_n} will converge w.p.1. It then stops 'trying' to escape from the stable set of the algorithm in which it is trapped. In general, if σ(x) ≠ I or b(x,ξ) ≠ -B_x(x) for some ξ, the algorithm does not necessarily (asymptotically) spend most of its time near a global minimum of B(·). The theory nevertheless tells us just where it does spend most of its time. There are practical alterations of (1.1) for which {X_n} would eventually spend most of its time near a global minimum, and these are discussed in Section 5. One form of the alteration requires a specific 'cyclic' correlation among the {ξ_n}. Very similar results hold for the diffusion


(1.2) $$dy = \bar b(y)\,dt + \frac{\sqrt{A_0}}{\sqrt{\log(t+A_1)}}\,\sigma(y)\,dw.$$
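For the diffusion (1.2), a standard Euler-Maruyama discretization gives a quick way to see the effect of the slowly decreasing noise level sqrt(A_0/log(t + A_1)). The sketch below is scalar. The functions bbar and sigma are user-supplied (hypothetical), and the step size dt belongs to the discretization, not to the model.

```python
import math
import random


def euler_maruyama_annealing(bbar, sigma, y0, T, dt, A0=1.0, A1=2.0, seed=0):
    """Euler-Maruyama sketch for (1.2):
        dy = bbar(y) dt + sqrt(A0 / log(t + A1)) sigma(y) dw   (scalar case)."""
    rng = random.Random(seed)
    n_steps = int(round(T / dt))
    t, y = 0.0, y0
    ys = [y]
    for _ in range(n_steps):
        noise_scale = math.sqrt(A0 / math.log(t + A1))  # sqrt(a(t)), decreasing in t
        dw = rng.gauss(0.0, math.sqrt(dt))              # Brownian increment over [t, t + dt]
        y = y + bbar(y) * dt + noise_scale * sigma(y) * dw
        t += dt
        ys.append(y)
    return ys
```

With sigma ≡ 0 the scheme reduces to the Euler method for ẏ = b̄(y), which gives a simple sanity check.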


Section 2 defines a number of terms and results from the theory of large deviations which are useful for the further formulation of our problem. In Section 3, we treat the mean (asymptotic) escape time of {X_n} from a set containing a stable set for the algorithm (1.1). Such quantities are important in the study of any such algorithm, since they provide useful information on the stability of the algorithm, its convergence, etc. In Section 4, we treat a global problem, where there are (possibly) many stable and unstable sets for (1.1) (i.e., for the ODE ẋ = b̄(x)). We obtain asymptotic formulas for the mean transition times between these sets. We also obtain the (conditional) transition distribution of a certain chain associated with the asymptotic behavior, whose properties yield the mean transition and sojourn times, 'near' invariant measures, etc.

Section 5 contains extensions. These are the form of the result for Itô equation models, the case where -b̄(x) is the gradient of a potential function, applications to the problem of global minimization of a function via Monte Carlo methods, and estimates of the asymptotic measures for {X_n} and y(·).

In the more standard works on 'simulated annealing' [1]-[3], the objective function values are known exactly, and the algorithm can 'move' by large steps. Here and in [8], the parameter set is not discrete and the algorithm moves only in small steps. With this constraint, the various algorithms are all quite similar, in that their transition probabilities are close. Moving by small steps often makes sense, particularly when the parameter set is not discrete. Of course, we allow sampling noise and/or imprecise measurements. The results in [8] are special cases of the results here.

Numerous variations are possible, with rather similar results, although the proofs might involve somewhat more detail. For example, the σ(X_n) can be periodic, which might make sense in the sampling noise case. The direction of iteration of (1.1) can be chosen at random, or several (a random number of) steps can be taken in each direction. Approximations to the Gaussian noise {ψ_n} can be used, with results close to those obtained here.
In order to simplify part of the notation later on, we actually work with slightly altered {ψ_n}. We truncate ψ_n so that a_n|ψ_n| → 0 w.p.1, while the truncation level goes to ∞ as n → ∞. This can be done since, for each δ > 0,

$$P\{a_n|\psi_n| \ge \delta\} \le \exp\left(-\frac{\delta^2}{2\sigma_0^2a_n^2}\right) \equiv \gamma_n$$

for some σ_0 > 0, where Σ_n γ_n < ∞. By the Borel-Cantelli lemma, a_n|ψ_n| ≥ δ therefore happens only finitely often w.p.1. In calculating the action functionals below, we can use either the {ψ_n} or their truncated versions and then take limits, and the result is the same. The procedure is unrestrictive, since we are interested in the 'tails' of the {X_n} and related processes. We continue to use the ψ_n notation, but also assume that (eventually) a_n|ψ_n| is as small as desired.
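The summability of {γ_n} can be checked directly. With a_n = A_0/log(n + A_1), we have γ_n = exp(-c log^2(n + A_1)), where c = δ^2/(2σ_0^2 A_0^2). Since c log^2 n > 2 log n once log n > 2/c, it follows that γ_n < n^{-2} beyond that point. The sketch below computes γ_n and that comparison index. The parameter values used in the check are illustrative assumptions.

```python
import math


def gamma_n(n, delta, sigma0, A0, A1=2.0):
    """Tail bound gamma_n = exp(-delta^2 / (2 sigma0^2 a_n^2)), a_n = A0 / log(n + A1)."""
    a_n = A0 / math.log(n + A1)
    return math.exp(-delta ** 2 / (2.0 * sigma0 ** 2 * a_n ** 2))


def comparison_threshold(delta, sigma0, A0):
    """Smallest integer n with log n > 2/c, where c = delta^2 / (2 sigma0^2 A0^2).
    For such n, c log(n)^2 > 2 log(n), hence gamma_n < n**-2 (a summable series)."""
    c = delta ** 2 / (2.0 * sigma0 ** 2 * A0 ** 2)
    return math.floor(math.exp(2.0 / c)) + 1
```

With δ = σ_0 = A_0 = 1 the comparison holds from n = 55 on. A larger A_0 pushes this index out, since exp(2/c) grows like exp(4σ_0^2 A_0^2/δ^2).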

For the case of independent and identically distributed {ξ_n}, define the H, H_0 and L functionals in the usual way in the theory of large deviations. Let b̄(x) = Eb(x,ξ_n) and define

(2.1)
$$H_0(\alpha,x) = \log E\exp\alpha'[b(x,\xi) - \bar b(x) + \sigma(x)\psi] = \log E\exp\alpha'[b(x,\xi) - \bar b(x)] + \alpha'\sigma(x)\Sigma\sigma'(x)\alpha/2,$$

where Σ = cov ψ (the second equality uses the independence of ξ and ψ), and

$$H(\alpha,x) = \alpha'\bar b(x) + H_0(\alpha,x),$$
$$L(\beta,x) = \sup_\alpha[\alpha'\beta - H(\alpha,x)] = \sup_\alpha[\alpha'(\beta - \bar b(x)) - H_0(\alpha,x)].$$

It is often convenient to treat L(·,·) as a function of β - b̄(x) and x.
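To make (2.1) concrete, consider a hypothetical scalar i.i.d. example. Let b(x,ξ) - b̄(x) = ξ take the values ±c with probability 1/2 each, and let σ(x)ψ be N(0, s^2) and independent of ξ. Then H_0(α,x) = log cosh(αc) + s^2α^2/2. L(β,x) is the Legendre transform, obtained by maximizing the concave function α(β - b̄(x)) - H_0(α,x), here by ternary search. When c = 0 the result must reduce to the Gaussian rate function (β - b̄(x))^2/(2s^2), which provides a check.

```python
import math


def H0_scalar(alpha, c, s2):
    """Hypothetical example: xi = +/-c (prob. 1/2 each) plus independent N(0, s2) noise.
    H_0(alpha) = log cosh(alpha c) + s2 alpha^2 / 2, with log cosh computed stably."""
    z = abs(alpha * c)
    log_cosh = z + math.log1p(math.exp(-2.0 * z)) - math.log(2.0)
    return log_cosh + s2 * alpha * alpha / 2.0


def rate_L(beta, bbar, c, s2, alpha_max=1e3, iters=300):
    """L(beta) = sup_alpha [alpha (beta - bbar) - H_0(alpha)] by ternary search.
    The objective is concave in alpha because H_0 is convex."""
    d = beta - bbar
    f = lambda a: a * d - H0_scalar(a, c, s2)
    lo, hi = -alpha_max, alpha_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1          # the maximizer lies to the right of m1
        else:
            hi = m2          # the maximizer lies to the left of m2
    return f((lo + hi) / 2.0)
```

Adding the bounded sampling noise (c > 0) increases H_0 and therefore lowers L. Deviations from b̄(x) become 'cheaper', which is one way in which {ξ_n} changes where the algorithm spends its time.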
Generally, we assume that there is a continuous function H_1(α,x), differentiable in α, and a Lipschitz continuous b̄(·), such that for any bounded stopping time ν and associated σ-algebra B_ν,

(2.2)
$$\bar b(x) = \lim_N\frac1N E_\nu\sum_{i=\nu+1}^{\nu+N} b(x,\xi_i),$$
$$H_1(\alpha,x) = \lim_N\frac1N\log E_\nu\exp\alpha'\sum_{n=\nu}^{\nu+N-1}[b(x,\xi_n) - \bar b(x)],$$

where the convergence is uniform in ω and ν, and in (x,α) in any compact set. For example, any finite-state ergodic Markov chain will do, as will the 'cyclic' noise of Section 5, or any sufficiently strongly (and stationary) mixing process. We now define

$$H_0(\alpha,x) = \alpha'\sigma(x)\Sigma\sigma'(x)\alpha/2 + H_1(\alpha,x)$$

and

$$H(\alpha,x) = \alpha'\bar b(x) + H_0(\alpha,x).$$
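A finite-state ergodic Markov chain is one of the examples just mentioned. For such a chain, it is a standard fact of large deviations theory that the limit defining H_1 in (2.2) is the logarithm of the Perron-Frobenius eigenvalue of a 'tilted' transition matrix. The sketch below treats a hypothetical two-state chain at a fixed x, with g(j) standing in for b(x, j). In the i.i.d. special case (identical rows of P) it reduces to log E exp α[g(ξ) - b̄(x)], which is the i.i.d. form of H_1.

```python
import math


def markov_H1(alpha, P, g):
    """H_1(alpha) for a hypothetical 2-state ergodic chain with transition matrix P,
    where g[j] plays the role of b(x, j) at a fixed x. Uses the Perron-Frobenius
    root of the tilted matrix M[i][j] = P[i][j] * exp(alpha * (g[j] - gbar))."""
    p01, p10 = P[0][1], P[1][0]
    pi0 = p10 / (p01 + p10)                   # stationary probability of state 0
    gbar = pi0 * g[0] + (1.0 - pi0) * g[1]    # plays the role of bbar(x)
    e0 = math.exp(alpha * (g[0] - gbar))
    e1 = math.exp(alpha * (g[1] - gbar))
    m00, m01 = P[0][0] * e0, P[0][1] * e1
    m10, m11 = P[1][0] * e0, P[1][1] * e1
    disc = (m00 - m11) ** 2 + 4.0 * m01 * m10  # nonnegative for a positive matrix
    rho = 0.5 * (m00 + m11 + math.sqrt(disc)) # largest eigenvalue
    return math.log(rho)
```

Because g is centered at its stationary mean, H_1(0) = 0 and H_1 ≥ 0, in line with the properties of H_N discussed below.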

For each T < ∞, define the action functional S_x(T,φ) to be ∞ for φ not absolutely continuous, and otherwise

$$S_x(T,\phi) = \int_0^T L(\dot\phi(s),\phi(s))\,ds, \qquad \phi(0) = x.$$
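The action functional can be approximated on a time grid by replacing φ̇ with finite differences. The sketch below assumes the Gaussian-only rate function L(β,x) = (β - b̄(x))^2/(2s^2). This corresponds to b(x,ξ) ≡ b̄(x) and σ(x)Σσ'(x) ≡ s^2 in (2.1); any other L(·,·) could be substituted. A solution of ẋ = b̄(x) should have (nearly) zero action, while moving against the drift is costly.

```python
import math


def action(path, T, bbar, s2):
    """Approximate S_x(T, phi) = int_0^T L(phi'(s), phi(s)) ds on a uniform grid,
    ASSUMING the Gaussian-only rate function L(beta, x) = (beta - bbar(x))**2 / (2*s2)."""
    n = len(path) - 1
    h = T / n
    total = 0.0
    for k in range(n):
        dphi = (path[k + 1] - path[k]) / h      # finite-difference velocity
        mid = 0.5 * (path[k] + path[k + 1])      # midpoint value of the state
        total += (dphi - bbar(mid)) ** 2 / (2.0 * s2) * h
    return total
```

For b̄(x) = -x, the path x_0 e^{-t} solves the ODE and costs essentially nothing. The reversed path x_0 e^{t} costs x_0^2(e^{2T} - 1).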

Let U(x) denote the set {β: L(β,x) < ∞}, with closure Ū(x). Each Ū(x) is convex, and Ū(·) is upper semicontinuous (in the Hausdorff topology) in the sense that lim_{x_n→x} Ū(x_n) ⊂ Ū(x). In the i.i.d. case,

$$\bar U(x) = \bar b(x) + \overline{\mathrm{co}}\,[b(x,\xi) - \bar b(x) + \sigma(x)\psi] = \bar b(x) + U_0(x),$$

where the barred co denotes the closed convex hull over the (a.s.) range of (ξ,ψ). Information on U(·) in the general case is in [6, Section 3] and in [4], [5]. The U_0(x) will be a set of 'control values' for the differential equation

(2.3) $$\dot x = \bar b(x) + u, \qquad u(t)\in U_0(x(t)).$$

U_0(x) and (2.3) give approximations to the possible noise-determined paths of the continuous parameter interpolation of the {X_n}, with interpolation intervals {a_n}. We always assume that U(·) is continuous in the Hausdorff topology.

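In the i.i.d. case, if there is a Σ_0 > 0 such that

$$\Sigma(x) \equiv \operatorname{cov}[b(x,\xi) - \bar b(x) + \sigma(x)\psi] \ge \Sigma_0 > 0,$$

we say the system is non-degenerate. If Σ(x) is singular for some x, the case is said to be degenerate. In general, let

$$\Sigma_N(x) = \frac1N\operatorname{cov}\sum_{i=1}^N[b(x,\xi_i) - \bar b(x) + \sigma(x)\psi_i], \qquad \Sigma_N(x)\to\Sigma(x) = \begin{pmatrix}\Sigma_{11}(x) & \Sigma_{12}(x)\\ \Sigma_{21}(x) & \Sigma_{22}(x)\end{pmatrix}.$$

The system is said to be non-degenerate if Σ(x) ≥ Σ_0 > 0. Consider the special form of the degenerate problem where Σ_11(x) = Σ_12(x) = Σ_21(x) = 0 and Σ_22(x) ≥ Σ̄_22 > 0. Then we can write (use x = (x_1,x_2), α = (α_1,α_2), etc.)

$$X_{1,n+1} = X_{1,n} + a_n[b_1(X_n,\xi_n) + \sigma_1(X_n)\psi_n],$$
$$X_{2,n+1} = X_{2,n} + a_n[b_2(X_n,\xi_n) + \sigma_2(X_n)\psi_n].$$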


Define

$$H_N(\alpha,x) = \frac1N\log E\exp\sum_{i=1}^N\alpha'[b(x,\xi_i) - \bar b(x) + \sigma(x)\psi_i],$$

and let H_{N,α} and H_{N,αα} denote the gradient and Hessian matrices (with respect to α), respectively. Then H_N(α,x) ≥ 0, H_{N,α}(0,x) = 0 and H_{N,αα}(0,x) = Σ_N(x). Let K denote an arbitrary compact set. Since H_N(α,x) → H_0(α,x) ≥ 0 and H_N(·,x) is convex, in the non-degenerate case H_0(·,x) is strictly convex in some neighborhood of α = 0 which does not depend on x, for x ∈ K. This implies that there is a δ_1 > 0 with the following properties for β in the δ_1-neighborhood N_{δ_1}(b̄(x)) and x ∈ K. L(β,x) is uniformly bounded, L(b̄(x),x) = 0, L_β(b̄(x),x) = 0, and L(·,x) is strictly convex in N_{δ_1}(b̄(x)). Thus L(β,x) = o(|β - b̄(x)|).

For the above special form of the degenerate case, L(β,x) = ∞ unless β_1 = b̄_1(x). There is a δ_1 > 0 (not depending on x in K) with the following properties for x ∈ K, β_2 ∈ N_{δ_1}(b̄_2(x)) and β_1 = b̄_1(x). L(β,x) is uniformly bounded, L_{β_2}(b̄(x),x) = 0, L(b̄_1(x),·,x) is strictly convex on N_{δ_1}(b̄_2(x)), and L(b̄_1(x),β_2,x) = o(|β_2 - b̄_2(x)|).

Define {X_n^ε} and x^ε(·) by

(2.5)
$$X^\varepsilon_{n+1} = X^\varepsilon_n + \varepsilon b(X^\varepsilon_n,\xi_n) + \varepsilon\sigma(X^\varepsilon_n)\psi_n,$$
$$x^\varepsilon(t) = X^\varepsilon_n \quad\text{on } [n\varepsilon, n\varepsilon+\varepsilon).$$

A piecewise linear interpolation could also be used to define x^ε(·). Under our conditions, mild alterations of the arguments in [4] can be used to show that S_x(T,φ) is an action functional for {x^ε(·)} in the following sense. Let C_x[0,T] denote the space of R^r-valued continuous functions on [0,T] with initial value x. Then for any T and any Borel set A in C_x[0,T], with interior A° and closure Ā,

(2.6)
$$-\inf_{\phi\in A^\circ}S_x(T,\phi) \le \liminf_{\varepsilon\to0}\varepsilon\log P_x\{x^\varepsilon(\cdot)\in A\} \le \limsup_{\varepsilon\to0}\varepsilon\log P_x\{x^\varepsilon(\cdot)\in A\} \le -\inf_{\phi\in\bar A}S_x(T,\phi).$$


For each fixed T, let Φ_x^s(T) = {φ: S_x(T,φ) ≤ s}, a compact set [4]. Then for any δ > 0 and d > 0, there is an ε_0 > 0 such that for ε ≤ ε_0 (d(·,·) denotes the appropriate distance)

(2.7) $$P_x\{d(x^\varepsilon(\cdot),\Phi^s_x(T)) \ge \delta\} \le \exp[-(s-d)/\varepsilon].$$

The uniformity of convergence (conditional on x, B_ν) in the definition (2.2) of H(α,x) has some consequences which will be quite important in the sequel. First, it is convenient to introduce some additional terminology. Let τ denote a stopping time with respect to the family of σ-algebras {B_i^ε, i ≤ t/ε} ≡ {B^ε(t)}, and let B^ε(τ) denote the associated 'stopped' σ-algebra. Let P^ε_{x,B^ε(τ)} denote the probability measure for x^ε(·), conditioned on B^ε(τ) and on x^ε(τ) = x. When using this terminology, it is only necessary that x be in the a.s. range of x^ε(τ), and we always assume this. Equivalently, we can take P^ε_{x,B^ε(τ)} to be the conditional distribution for the process which is reset to the value x at time τ and then continues to evolve as before. Then (2.6) and (2.7) can be replaced by (2.6') and (2.7'), where the convergence in (2.6') and the bound in (2.7') are uniform in τ and ω.


3. The Escape Time Problem


Let K_0 denote a compact stable invariant set for

(3.1) $$\dot x = \bar b(x),$$

and let G be a bounded open set containing K_0, with a piecewise differentiable boundary ∂G and with G in the domain of attraction of K_0; i.e., all trajectories starting in G converge to K_0. For each n ≥ 0, define {X_k^n} and x^n(·) by X_0^n = x and

(3.2)
$$X^n_{k+1} = X^n_k + a_{n+k}b(X^n_k,\xi_{n+k}) + a_{n+k}\sigma(X^n_k)\psi_{n+k},$$
$$x^n(t) = X^n_j \quad\text{for } t\in\Big[\sum_{i=0}^{j-1}a_{n+i},\ \sum_{i=0}^{j}a_{n+i}\Big).$$


(3.2) is the actual process with which we work in order to study the tail of the {X_n}. Set τ^n = min{t: x^n(t) ∉ G}. In this section, we compute the asymptotics of {τ^n} for x ∈ G under the following 'controllability' condition:

A3.1. For each δ > 0 there are a ρ > 0, with ρ-neighborhood N_ρ(K_0) ⊃ K_0, and a T_ρ < ∞ such that the following holds. For each x, y ∈ N_ρ(K_0) there is a path φ(·) with φ(0) = x and φ(T_y) = y, where T_y ≤ T_ρ and S_x(T_y,φ) ≤ δ.

The condition is not very restrictive, and holds in 'typical' cases. It is a natural generalization of the usual non-degeneracy assumption used in large deviations work with diffusion processes, where the analog of (A3.1) always holds. For example, let b̄(x) = 0 in K_0 and let the problem be non-degenerate (e.g., let σ(x)Σσ'(x) > 0 on K_0). Then (A3.1) follows from L(β,x) = o(|β - b̄(x)|) (see the discussion above (2.5)). Alternatively, assume non-degeneracy and suppose that, for each γ > 0, there is a T_γ < ∞ such that the γ-neighborhood of the path of (3.1) on [0,T_γ], starting at any x ∈ K_0, covers K_0. Then (A3.1) follows from these facts together with the fact that S(T,φ) = 0 for functions φ(·) satisfying (3.1). In typical applications to global minimization by Monte Carlo, (A3.1) and (3.3) below hold, since cov ψ_n > 0 and σ(x) is the identity matrix.

Define

$$S(x,y) = \inf_{\phi,T}\{S_x(T,\phi): \phi(0) = x,\ \phi(T) = y\}$$

and, for x ∈ G,

$$S_G(x) = \inf_{y\in\partial G}S(x,y) = \inf_{\phi,T}\{S_x(T,\phi): \phi(0) = x,\ \phi(T)\in\partial G\}, \qquad S_G(B) = \inf_{x\in B}S_G(x).$$

By (A3.1) and the fact that S(T,φ) = 0 if φ(·) satisfies (3.1), S_G(x) is constant on K_0 and S_G(x) ≤ S_G(K_0) for x ∈ G. The functions S(·,·) and S_G(·) are lower semicontinuous (this result does not require (A3.1)) [4]. Also,

$$S(x,y) = -\lim_{T\to\infty}\lim_{\delta\to0}\lim_{\varepsilon\to0}\varepsilon\log P_x\{\tau_\delta\le T\},$$

where τ_δ = inf{t: x^ε(t) ∈ N_δ(y)}.

In Theorem 1, we will use the following auxiliary result.
Lemma 3.1. Under (A3.1) and the other conditions assumed above, for each y ∈ G, S(x,y) (resp., S_G(x)) is continuous at each x ∈ K_0.

Outline of proof. The proof uses the controllability condition to construct 'nearly' optimal paths. Let S(x,y) < ∞. (Otherwise, a similar proof yields S(x_n,y) → ∞ as x_n → x ∈ K_0.) By the controllability condition (A3.1), for x, x' ∈ K_0 and any sequence x_n → x, we have S(x,x') = 0, S(x_n,x) → 0 and S(x,x_n) → 0. The lemma then follows from

$$S(x_n,y) \le S(x_n,x) + S(x,y), \qquad S(x,y) \le S(x,x_n) + S(x_n,y),$$

which together give

$$-S(x,x_n) \le S(x_n,y) - S(x,y) \le S(x_n,x). \qquad \text{Q.E.D.}$$

Let G_ρ denote a ρ-neighborhood of G.

Theorem 1. Assume (A3.1) and the other conditions above, and suppose that as ρ → 0,

(3.3) $$S_{G_\rho}(x)\to S_G(x) \quad\text{for some } x\in K_0.$$

Then, for large enough A_0,

(3.4) $$\lim_n a_n\log E_x\tau^n = S_G(K_0), \qquad x\in G.$$

Remark. The continuity condition (3.3) is not very restrictive. If it does not hold for G, it will hold for a small perturbation of G. It always holds if σ(x)Σσ'(x) is positive definite on ∂G. Other conditions guaranteeing it, based on the 'controllability' assumption (A3.1), are in [6, Section 4]. It also holds for the particular replacement for G used in Section 4, namely R^r - ∪_{i∈J} N_{γ_1}(K_i). The proof of Theorem 1 is an adaptation of that of Theorem 4.1 of [7].
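A worked special case may help in reading (3.4). Suppose (hypothetically) that r = 1, that there is no sampling noise (b(x,ξ) ≡ b̄(x) = -B_x(x)), and that σ(x)Σσ'(x) ≡ s^2, so that L(β,x) = (β - b̄(x))^2/(2s^2). For such gradient systems it is a standard large-deviations fact that the cheapest way to climb from a minimum to a level B(y) is to follow the time-reversed flow φ̇ = B_x(φ), at cost 2[B(y) - B(K_0)]/s^2. The sketch checks this numerically for the double well B(x) = (x^2 - 1)^2, with K_0 = {1} and a hypothetical G = (0.2, 1.8). It then evaluates the logarithmic scale S_G(K_0)/a_n = S_G(K_0) log(n + A_1)/A_0 suggested by (3.4). On that scale, the interpolated escape time E_xτ^n grows roughly like (n + A_1)^{S_G(K_0)/A_0}.

```python
import math


def B(x):
    """Hypothetical double-well potential with minima at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2


def dB(x):
    return 4.0 * x * (x * x - 1.0)


def reversed_path_action(x_start, x_exit, s2, h=1e-4):
    """Follow phi' = +B'(phi), the time-reversed flow of (3.1) with bbar = -B',
    from x_start until phi crosses x_exit. Accumulate the Gaussian-case action
    (phi' - bbar(phi))**2 / (2*s2) using a midpoint (RK2) step."""
    phi, total = x_start, 0.0
    while (phi - x_exit) * (x_start - x_exit) > 0:        # still on the start side
        mid = phi + 0.5 * h * dB(phi)                       # RK2 midpoint state
        velocity = dB(mid)                                  # phi' along the reversed flow
        total += (velocity + dB(mid)) ** 2 / (2.0 * s2) * h # bbar(mid) = -dB(mid)
        phi += h * velocity
    return total


def log_escape_scale(S_G, A0, n, A1=2.0):
    """Logarithmic scale suggested by (3.4): log E tau^n is approximately S_G / a_n."""
    return S_G * math.log(n + A1) / A0
```

Here S_G(K_0) is the cheaper of the two exits, through x = 0.2, and is about 2B(0.2)/s^2. A larger A_0 makes the exponent S_G(K_0)/A_0, and hence the growth of the escape time, smaller.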

Proof. Part 1. The asymptotic properties and result (3.4) will not change if we redefine a_n as a_0 = a_1 = A_0 and a_n = A_0/log n for n > 1. Thus, after time 1, we set A_1 = 0. To obtain the lower and upper bounds on the escape times which give (3.4), we approximate the {a_n} by a piecewise constant sequence. We will actually work only with this approximation, but the general result follows readily from it. Fix α > 1 but close to 1. We divide the 'discrete' time into sections {[1,n_1), [n_1,n_2), ...} such that the ratio of the value at the start of a section (a_{n_k}) to that at the end (a_{n_{k+1}}) is roughly α. Thus (if n_k^α is not an integer, use any 'nearest' integer)

(3.5) $$n_{k+1} = n_k^\alpha, \qquad n_1 > 1.$$

For n ∈ [n_k, n_{k+1}), k ≥ 1, define

$$\bar a^k_n = A_0/\log n_{k+1} \equiv \varepsilon_k = (A_0/\log n_k)/\alpha.$$

We call [n_k, n_{k+1}) the kth section. In the (continuous parameter) interpolated time scale, this section has length

(3.6) $$\varepsilon_k[n_{k+1}-n_k] = \frac{A_0}{\alpha\log n_k}\,(n_k^\alpha - n_k) = A_0A_3^k\exp[\alpha\log n_k - \log\alpha], \qquad A_3^k = \frac{1}{\log n_k}\,(1 - n_k^{1-\alpha}).$$

This interval is larger than A_0 exp(c_1α^k) (for some c_1 > 0) if k is large enough. The 'interpolated interval' ε_k[n_{k+1} - n_k] is called the interpolated k-section.
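The bookkeeping in (3.5)-(3.6) is easy to check numerically. The sketch below keeps n_k real-valued (the proof rounds to a 'nearest' integer) and uses a_n = A_0/log n as in Part 1. It compares ε_k(n_{k+1} - n_k) with the closed form in (3.6). The values of n_1, α and A_0 used in the check are illustrative assumptions.

```python
import math


def sections(n1, alpha, A0, k_max):
    """Return (k, n_k, eps_k, length_k, closed_form_k) for k = 1..k_max, where
    n_{k+1} = n_k**alpha, eps_k = A0 / log n_{k+1}, and
    length_k = eps_k (n_{k+1} - n_k) is the interpolated k-section length of (3.6)."""
    rows = []
    n_k = n1
    for k in range(1, k_max + 1):
        n_next = n_k ** alpha
        eps_k = A0 / math.log(n_next)                 # equals (A0 / log n_k) / alpha
        length = eps_k * (n_next - n_k)
        A3 = (1.0 - n_k ** (1.0 - alpha)) / math.log(n_k)
        closed_form = A0 * A3 * math.exp(alpha * math.log(n_k) - math.log(alpha))
        rows.append((k, n_k, eps_k, length, closed_form))
        n_k = n_next
    return rows
```

Since log n_k = α^{k-1} log n_1, the section lengths grow roughly like exp(α^k log n_1) up to a factor of order α^{-k}. This super-exponential growth is what makes the ratio (3.9) blow up.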

We now define the analog of (3.2) with piecewise constant coefficients. Define {X̄_n^k} and x̄^k(·) as follows: X̄_0^k = x and, for each k,

(3.7) $$\bar X^k_{n+1} = \bar X^k_n + \bar a^k_n\,[b(\bar X^k_n,\xi_{n_k+n}) + \sigma(\bar X^k_n)\psi_{n_k+n}],$$

where ā_n^k = ε_k for the first (n_{k+1} - n_k) terms, ā_n^k = ε_{k+1} for the next (n_{k+2} - n_{k+1}) terms, etc. To define the piecewise constant continuous parameter interpolation x̄^k(·) of {X̄_n^k}, we use the interpolation intervals {ā_n^k}. Let B_k(t) denote the minimal σ-algebra measuring all the data, starting from time zero, up to the data used to calculate x̄^k(t).

Define τ̄^k = min{t: x̄^k(t) ∉ G}. Let τ^ε denote the escape time from G for the process x^ε(·) introduced in (2.5). Then

(3.8a) $$\lim_{\varepsilon\to0}\varepsilon\log E_x\tau^\varepsilon = S_G(K_0), \qquad x\in G.$$

((3.8a) will be obtained as a by-product of the development below.) We will show that

(3.8b) $$\lim_k\varepsilon_k\log E_x\bar\tau^k = S_G(K_0), \qquad x\in G.$$

The theorem readily follows from this and the arbitrariness of α. The key to the development is the following fact. Replacing ε by ε_k in (3.8a) and taking limits as k → ∞ yields (in the sense of logarithmic asymptotic equivalence)

$$E_x\tau^{\varepsilon_k} \sim \exp\big(S_G(K_0)\alpha^k/A_0\big).$$

Moreover, for large A_0 the ratio of the quantity in (3.6) to this expression, namely

(3.9) $$\frac{A_0\exp(c_1\alpha^k)}{\exp\big(S_G(K_0)\alpha^k/A_0\big)},$$

goes to infinity very fast as k → ∞.


Part 2. Assume S_G(K_0) < ∞. (Otherwise a similar proof yields the result.) Fix d > 0. Choose 0 < γ_1 < γ_2 < γ_3, T_1, T_2 (T_2 > T_1), T_3, h > 0 and δ > 0 such that the following hold:

(a) For any initial condition x ∈ G - N_{γ_1}(K_0), the path of (3.1) gets into N_{γ_1}(K_0) by time T_1 and never leaves N_{γ_2}(K_0) after that.

(b) For each point x ∈ N_{γ_3}(K_0) there is a path φ^x(·) such that φ^x(0) = x, φ^x(t) ∉ G_h ≡ N_h(G) (the h-neighborhood of G) for some t ≤ T_3, and

$$S_x(T_3,\phi^x) \le S_G(K_0) + d/4.$$

(c) There is a path with cost ≤ d/8 connecting any x, y ∈ N_{γ_3}(K_0) in time ≤ T_2 - T_3.

Requirement (c) can be satisfied owing to (A3.1), and requirement (b) owing to the 'continuity' condition (3.3). Requirement (a) can be satisfied since K_0 is the only limit set for (3.1) in G (or in G_h, for small h).

Part 3. Let N_k denote the number of intervals of length T in the interpolated k-section. By (3.6), there is a c_1 > 0 such that N_k ≥ exp c_1α^k for large enough A_0. We will next evaluate (3.10), an upper bound for E_xτ̄^k:

(3.10) $$E_x\bar\tau^k \le T\sum_{n=0}^{\infty}P_x\{\bar\tau^k > nT\}.$$

Let P^k_{B_k(t)} denote the probability measure for x̄^k(·), conditioned on B_k(t). Until after (3.14), let nT ≤ ε_k(n_{k+1} - n_k). Note that

(3.11) $$P_x\{\bar\tau^k > nT\} = E_x\big[1 - P^k_{B_k(nT-T)}\{\bar\tau^k\le nT\}\big]I_{\{\bar\tau^k > nT-T\}}.$$


We have, for T = … (A and B are the two events described in the middle term of (3.12)),

(3.12)
$$P^k_{B_k(nT-T)}\{\bar\tau^k\le nT\}I_{\{\bar\tau^k>nT-T\}} \ge P^k_{B_k(nT-T)}\big\{\bar x^k(\cdot)\text{ goes to }N_{\gamma_1}(K_0)\text{ and then stays in }N_{\gamma_2}(K_0),\text{ all on }[nT-T,\,nT-T+T_1];\text{ then leaves }G\text{ on }[nT-T+T_1,\,nT]\big\}I_{\{\bar\tau^k>nT-T\}}$$
$$= P^k_{B_k(nT-T)}\{A\cap B\}\,I_{\{\bar\tau^k>nT-T\}}.$$


We have

$$\inf_{\omega:\,\bar x^k(nT-T)\in G}P^k_{B_k(nT-T)}\{A\} \ge 1/2$$

for large k. This follows from (2.7'), together with the fact that S_y(T,φ) = 0 if and only if φ(·) satisfies (3.1), and the fact that all trajectories of (3.1) starting in G stay in N_{γ_2}(K_0) after time T_1. Then, for y ∈ N_{γ_2}(K_0) and large k, (2.6') yields

(3.13)
$$P^k_{B_k(nT-T+T_1)}\{B\}\,I_{\{\bar\tau^k>nT-T\}} \ge \inf_{y,\,\omega:\,\bar x^k(nT-T+T_1)=y\in N_{\gamma_2}(K_0)}P^k_{B_k(nT-T+T_1)}\Big\{\sup_{0\le t\le T_3}|\phi^y(t) - \bar x^k(nT-T+T_1+t)| \le h/4\Big\}I_{\{\bar\tau^k>nT-T\}}$$
$$\ge \exp\big(-[S_G(K_0)+d/2]/\varepsilon_k\big)\,I_{\{\bar\tau^k>nT-T\}}.$$

Combining the above estimates yields an upper bound for (3.11). For large k the factor 1/2 can be absorbed into the exponent, since the path in (b) costs at most S_G(K_0) + d/4 < S_G(K_0) + d/2. The bound is

(3.14) $$P_x\{\bar\tau^k > nT\} \le \big[1 - \exp\big(-(S_G(K_0)+d/2)/\varepsilon_k\big)\big]P_x\{\bar\tau^k > nT-T\}.$$
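Define

$$\beta_k = \big[1 - \exp\big(-(S_G(K_0)+d/2)/\varepsilon_k\big)\big]^{N_k}.$$

Now iterate (3.14) up to the n with nT = ε_k(n_{k+1} - n_k). Then use ε_{k+1} for the next section [n_{k+1}, n_{k+2}), and so on. Doing this and substituting into (3.10) yields that, for large k,

$$E_x\bar\tau^k \le T\sum_{n=0}^{N_k}\big[1-e^{-(S_G(K_0)+d/2)/\varepsilon_k}\big]^n + T\beta_k\sum_{n=0}^{N_{k+1}}\big[1-e^{-(S_G(K_0)+d/2)/\varepsilon_{k+1}}\big]^n + T\beta_k\beta_{k+1}\sum_{n=0}^{N_{k+2}}\big[1-e^{-(S_G(K_0)+d/2)/\varepsilon_{k+2}}\big]^n + \cdots$$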
n=0 

In order to estimate the terms in the sum beyond the first, note that

$$\beta_m\exp\big[(S_G(K_0)+d/2)/\varepsilon_{m+1}\big] \le \exp\big[-N_m\exp\big(-(S_G(K_0)+d/2)/\varepsilon_m\big)\big]\exp\big[(S_G(K_0)+d/2)/\varepsilon_{m+1}\big].$$

Since N_m ≥ exp c_1α^m for some c_1 > 0 if m and A_0 are large, this is a term of a summable series. Thus, for large k and A_0,

(3.15) $$E_x\bar\tau^k \le T\sum_{n=0}^{\infty}\big[1-e^{-(S_G(K_0)+d/2)/\varepsilon_k}\big]^n + \text{constant} \le \exp\big[(S_G(K_0)+d)/\varepsilon_k\big].$$
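Two elementary facts are used in (3.10)-(3.15), with p = exp[-(S_G(K_0) + d/2)/ε_k]. The first is the geometric-series bound Σ_{n=0}^{N}(1 - p)^n ≤ 1/p. The second is (1 - p)^N ≤ exp(-Np). A small Python check makes the orders of magnitude visible; the numbers in the check are illustrative. The expected number of intervals of length T before escape is of order exp[(S_G(K_0) + d/2)/ε_k], while β_k becomes negligible once N_k p is large.

```python
import math


def escape_bound_terms(S, d, eps, N):
    """Return (p, partial, beta), where
    p       = exp(-(S + d/2) / eps), the per-interval exit probability bound of (3.14);
    partial = sum_{n=0}^{N} (1 - p)**n, a geometric sum as in (3.10)-(3.15);
    beta    = (1 - p)**N, the factor beta_k defined before (3.15)."""
    p = math.exp(-(S + d / 2.0) / eps)
    partial = (1.0 - (1.0 - p) ** (N + 1)) / p    # closed form of the geometric sum
    beta = (1.0 - p) ** N
    return p, partial, beta
```
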

Part 4. For the rest of the proof we let the {ξ_n} be mutually independent. This is for notational convenience only: it allows us to avoid the notation associated with the conditioning used in (e.g.) (3.12) and (3.13). In general, we work as in the last part, using appropriate conditioning and taking sup_ω or inf_ω as appropriate. In fact, the proof of Theorem 2 in Section 4 uses the 'full conditioning' argument. We will need the following lemma, whose proof is only a slight modification of that of Lemma 2.2 of [7, Chapter 4] or Lemma 1.9 of [7, Chapter 6], and is omitted. The second part of the lemma will be used in Theorem 2 below (it does not assume mutual independence of the {ξ_n}).

Lemma 2. For each small γ > 0, there are c > 0, T_0 < ∞ and ε_0 > 0 such that, for ε ≤ ε_0, all y ∈ G - N_γ(K_0) and all T_4,

$$P_y\{\tau_\gamma > T_4\} \le \exp\big(-c(T_4-T_0)/\varepsilon\big),$$

where

$$\tau_\gamma = \inf\{t: x^\varepsilon(t)\notin G - N_\gamma(K_0)\}.$$

More generally, let τ denote a stopping time and let K be a compact set which does not contain an entire limit set for (3.1). Define τ_K = inf{t: x^ε(τ+t) ∉ K}. Then

$$P^\varepsilon_{y,B^\varepsilon(\tau)}\{\tau_K > T_4\} \le \exp\big(-c(T_4-T_0)/\varepsilon\big)$$

for all finite stopping times τ, all y ∈ K and all ε ≤ ε_0. (P^ε_{y,B^ε(τ)} was defined above (2.6').)

We now proceed very similarly to [7, pp. 125-126]. Note that

$$\sum_{n=M_k}^{\infty}\big[1-e^{-(S_G(K_0)+d/2)/\varepsilon_k}\big]^n \to 0.$$

Also, the contribution to the mean value of τ̄^k from paths which do not exit G before the interpolated k-section is used up is vanishingly small as k → ∞. Hence, in calculating a lower bound on the escape time, we can and will assume for (3.7) that ā_n^k = ε_k for all n (or, equivalently, work with x^ε(·) for ε = ε_k). Define F_1 = N_{γ_3}(K_0) - N_{γ_2}(K_0) and F_2 = N_{γ_1}(K_0) ∪ (R^r - G), and define the stopping times {σ_i, ρ_i} by ρ_0 = 0 and

$$\sigma_i = \inf\{t>\rho_i: \bar x^k(t)\in F_1\}, \qquad \rho_i = \inf\{t>\sigma_{i-1}: \bar x^k(t)\in F_2\}.$$

The only way for x̄^k(·) to jump from the exterior of N_{γ_3}(K_0) into N_{γ_2}(K_0) is for it to be pushed there by a very large value of ψ_n. This is ruled out by the comments made at the beginning of Section 2.
For x ∈ N_{γ_1}(K_0) and any T_4 < ∞,

(3.16) $$P_x\{\bar x^k(\rho_1)\in R^r - G\} \le \max_{y\in F_1}P_y\{\bar\tau^k = \rho_1\le T_4\} + P_x\{\bar\tau^k = \rho_1 > T_4\}.$$

By Lemma 2, for each M < ∞ there is a T_4 < ∞ such that the far right-hand term of (3.16) is less than exp(-M/ε_k) for large k.


Recall that we chose the γ_i such that S(x',y') ≤ d/4 for x', y' ∈ N_{γ_3}(K_0). By the compactness (for each s, y) and upper semicontinuity (in s, y) of the sets Φ_y^s(t), there is a δ_1 > 0 (not depending on y) with the following property. The paths which start at y ∈ N_{γ_3}(K_0) and exit G before T_4 are at a distance of at least δ_1 from the set Φ_y^{S_G(K_0)-d/2}(T_4). (The minimum value of S_y(T_4,φ) for such an exiting path must be at least S_G(K_0) - d/4.) It then follows from (2.7) that, for y ∈ F_1,

(3.17) $$P_y\{\bar\tau^k < T_4\} \le \exp\big(-[S_G(K_0)-d]/\varepsilon_k\big)$$

for large k. In fact, by the upper semicontinuity and compactness just cited, we can write max_{y∈F_1} P_y{τ̄^k < T_4} in (3.17). Then, for a large fixed M and all large k,

$$P_x\{\bar x^k(\rho_1)\notin G\} \le \exp\big(-(S_G(K_0)-2d)/\varepsilon_k\big), \qquad x\in N_{\gamma_1}(K_0).$$

Define v = min{n: x̄^k(ρ_n) ∉ G}. Then, for x ∈ N_{γ_1}(K_0),

$$P_x\{v>n\} = P_x\{\bar x^k(\rho_i)\in N_{\gamma_1}(K_0),\ \text{all } i\le n\} \ge \inf_{y\in N_{\gamma_1}(K_0)}P_y\{\bar x^k(\rho_1)\in N_{\gamma_1}(K_0)\}\,P_x\{v>n-1\} \ge \big(1-\exp\big(-(S_G(K_0)-2d)/\varepsilon_k\big)\big)^n.$$

For each choice of the γ_i, there is a t_1 > 0 such that inf_{y∈N_{γ_1}(K_0)} E_y(ρ_1 - σ_0) ≥ t_1. Thus

$$E_x\bar\tau^k \ge \sum_{n=1}^{\infty}E_xI_{\{v\ge n\}}(\rho_n-\rho_{n-1}) \ge \sum_{n=1}^{\infty}E_xI_{\{v\ge n\}}(\rho_n-\sigma_{n-1}) \ge \sum_{n=1}^{\infty}P_x\{v\ge n\}\inf_yE_y(\rho_1-\sigma_0) \ge (\text{constant})\exp\big[(S_G(K_0)-2d)/\varepsilon_k\big].$$

This, (3.15), and the arbitrariness of d yield (3.8b). Q.E.D.

Remark. The proof with the actual coefficients a_{n+k} of (3.2) follows readily from the above proof and the fact that we can choose α > 1 arbitrarily close to 1. For all practical purposes, the 'piecewise constant' ā_n^k can be used in lieu of the a_{n+k}.
4. Asymptotic (Large Time) Properties of {X_n}

In this section, we obtain results analogous to those in [7, Chapter 6] for the {X_n} and {x^n(·)} of (3.2). Again, we use the 'intermediate' processes {x̄^k(·)} with piecewise constant coefficients to obtain the results. Let I = {1,...,m}, and let K_1,...,K_m denote a collection of disjoint compact sets, each of which is an invariant set for (3.1), such that ∪_i K_i contains all the limit sets for (3.1). If S(x,y) = 0 for all x, y in some set K, let that K be one of the K_i. The collection {K_i} contains all the stable (and unstable) sets for the algorithms (1.1), (2.5) and (3.2). It is therefore of interest to study the asymptotic statistics of the movement from a neighborhood of one of the K_i to a neighborhood of another. This is particularly important for understanding the use of (1.1) for global minimization (or 'near' minimization) by Monte Carlo.

We make some additional assumptions.

A4.1. The controllability assumption (A3.1) holds for each K_i, i = 1,...,m, replacing K_0 there.

Define $S_{ij} = S(K_i,K_j) = \inf_{x\in K_i,\,y\in K_j} S(x,y)$. By (A4.1), $S_{ij} = S(x,y)$ for any $x \in K_i$ and $y \in K_j$, and $S(x,y) = 0$ for $x, y \in K_i$. Also, by an argument like that of Lemma 1, $S(x,y)$ is continuous in $x, y$ for $x, y \in \bigcup_i K_i$.

It is useful to be able to bound the paths $x^\epsilon(\cdot)$, $\tilde x^n(\cdot)$, etc. There are several ways of doing this. Perhaps the simplest is to project them back onto some (large) set $D_1$ if they ever leave it. This idea, however, involves a number of new considerations and details. A reasonable alternative is to fix the dynamics so that, for some compact set $D_1$ (a sphere, for example), all paths remain in $D_1$. This is not a restriction in applications, since in the simulations we can always add a penalty function and choose $\sigma(x)$, or otherwise fix the dynamics for large $|x|$, to guarantee bounded paths. For simplicity, assume the following.


A4.2. There is a sphere $D_1$ such that $D_1$ contains $\bigcup_i K_i$ in its interior, $\sigma(x) \to 0$ as $x \to \partial D_1$, and the trajectories of $x^\epsilon(\cdot)$, $x^n(\cdot)$ stay in $D_1$. All paths of (3.1) starting in $D_1$ stay in $D_1$.

By (A4.2), we can assume that for small $\delta > 0$, any $\delta$-optimal path connecting a small neighborhood of $K_i$ with a small neighborhood of $K_j$ does not leave $D_1$. That is, we can assume that for small $\delta > 0$, if $\phi(\cdot)$ is such that $\phi(0) = x$, $\phi(T) = y$, $x$ lies in a small neighborhood of $K_i$, $y$ lies in a small neighborhood of $K_j$, and $S_x(T,\phi) \le S_{ij} + \delta$, then $\phi(t) \in D_1$ for $t \le T$.

Let the $\mu_i$ be defined as in Theorem 1, but add (fixed henceforth) a $\mu_4 > \mu_3$ with the $\{N_{\mu_4}(K_i)\}$ disjoint; define $g_i = N_{\mu_1}(K_i)$ and $\Gamma_i = N_{\mu_4}(K_i) - N_{\mu_3}(K_i)$. The natural analog of the scheme in [7, Chapter 6] for getting the asymptotics of $\{x^n(\cdot)\}$ or $\{x^\epsilon(\cdot)\}$ involves estimating the probability of the process going from $g_i$ to $\Gamma_i$ and then to $g_j$, $j \ne i$, and then calculating the mean times via the particular formulas developed in [7] involving products of the probabilities of various chains connecting the $\{g_i\}$. With a few modifications, the results carry over to our case. We first reproduce some of the notation in [7], adapted to our case. The proofs here will be simpler than those in [7], since the set $D$ in [7, Chapter 6] is replaced here by a set of the form $R^r - \bigcup_{j\in J} g_j$, for some subset $J \subset I$, and the $N_{\mu_1}(K_j)$ are 'small' neighborhoods.

Let $J$ denote a subset of $I$ with $\ell$ members, where $\ell < m$. Define $g_J$ by $g_J = \bigcup_{i\in J} g_i$. By slightly altering the $N_{\mu_1}(K_j)$, we can assume that the boundaries are as smooth as desired. A $J$-graph is defined to be a set of $m - \ell$ arrows $\{\gamma \to \delta\}$ connecting points in $I$, where $\gamma \in I - J$, $\delta \in I$, there are no cycles, and each point in $I - J$ has one and only one arrow leaving it. $G(J)$ denotes the collection of $J$-graphs. By the symbol $g \in G(i \not\to J)$, we mean a collection of $m - \ell - 1$ arrows $\{\gamma \to \delta\}$, without cycles, where $i \in I - J$, $\gamma \in I - J$, $\delta \in I$, and not containing chains leading from $i$ to $J$.

We also use the following definitions. Note that our $S$ is the $V$ of [7]. Again, the notation is adapted from [7, Chapter 6]. Define


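$$\tilde S_{ij} = \inf\Bigl\{S_x(T,\phi) : \phi(0) \in K_i,\ \phi(T) \in K_j,\ \phi(t) \notin \textstyle\bigcup_{s \ne i,j} K_s,\ t \le T\Bigr\}, \tag{4.1}$$
with $\tilde S_{ij} = \infty$ if the above set is empty, and
$$W_J = \min_{g\in G(J)} \sum_{(\gamma\to\delta)\in g} \tilde S_{\gamma\delta}, \qquad M_J(K_i) = \min_{g\in G(i\not\to J)} \sum_{(\gamma\to\delta)\in g} \tilde S_{\gamma\delta}.$$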
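To make the graph quantities in (4.1) concrete, here is a brute-force sketch that enumerates $G(J)$ and $G(i \not\to J)$ for a small hypothetical table of $\tilde S_{ij}$ values. Nodes are numbered from 0, and the table `S` and all function names are ours, for illustration only. Each graph in $G(i \not\to J)$ is built as a $(J \cup \{j\})$-graph, $j \notin J$, in which the chain of arrows leaving $i$ ends at $j$; this is one way of reading the definition above. Enumeration is exponential in $m$, so this is only meant for checking small examples by hand.

```python
import itertools
import math


def chain_end(graph, start, sinks):
    """Follow the arrows from `start` until a sink is reached.

    Returns that sink, or None if the arrows run around a cycle first."""
    seen = set()
    node = start
    while node not in sinks:
        if node in seen:
            return None
        seen.add(node)
        node = graph[node]
    return node


def acyclic_graphs(m, sinks):
    """Yield every graph on {0, ..., m-1} in which each node outside `sinks`
    has exactly one outgoing arrow (to another node) and there are no cycles."""
    sources = [v for v in range(m) if v not in sinks]
    choices = [[d for d in range(m) if d != v] for v in sources]
    for targets in itertools.product(*choices):
        graph = dict(zip(sources, targets))
        if all(chain_end(graph, v, sinks) is not None for v in sources):
            yield graph


def cost(graph, S):
    """Sum of S[gamma][delta] over the arrows gamma -> delta of the graph."""
    return sum(S[g][d] for g, d in graph.items())


def W(S, J):
    """W_J: minimum arrow cost over the J-graphs G(J)."""
    return min(cost(g, S) for g in acyclic_graphs(len(S), set(J)))


def M(S, J, i):
    """M_J(K_i): minimum cost over G(i -/-> J), built as (J + {j})-graphs,
    j not in J, in which the chain of arrows leaving i ends at j."""
    best = math.inf
    for j in range(len(S)):
        if j in J:
            continue
        sinks = set(J) | {j}
        for g in acyclic_graphs(len(S), sinks):
            if chain_end(g, i, sinks) == j:
                best = min(best, cost(g, S))
    return best


# Hypothetical table of S~_{ij}; row 2 is never used because 2 is in J.
S = [[0.0, 1.0, 3.0],
     [0.5, 0.0, 2.0],
     [math.inf, math.inf, 0.0]]
J = {2}
print(W(S, J), M(S, J, 0), W(S, J) - M(S, J, 0))  # 3.0 0.5 2.5
```

Here the three $J$-graphs cost 3, 3.5 and 5, and the cheapest graph in $G(0 \not\to J)$ is the single arrow $1 \to 0$.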


Let $\tau_J^n$, $\tau_{k,J}$ and $\tau_J^\epsilon$ denote the first entrance times into the set $g_J$ for the $x^n(\cdot)$, $\tilde x^n(\cdot)$ and $x^\epsilon(\cdot)$ processes, respectively.
Theorem 2. Under our conditions, for large $A_0$,
$$\lim_n a_n \log E_x\tau_J^n = \lim_k \epsilon_k \log E_x\tau_{k,J} = \lim_{\epsilon\to0} \epsilon\log E_x\tau_J^\epsilon = W_J - M_J(K_i), \tag{4.2}$$
uniformly for $x$ in any small enough neighborhood of any $K_i$, $i \in I - J$.
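Theorem 2 is the diffusion analog of a statement about finite Markov chains whose transition probabilities are of order $\exp(-\tilde S_{ij}/\epsilon)$, which is the setting of the graph formulas in [7, Chapter 6]. The sketch below checks that chain statement numerically, not the theorem itself. It builds a hypothetical three-state chain with $p_{ab} = \exp(-\tilde S_{ab}/\epsilon)$, $a \ne b$, for the table $\tilde S_{01} = 1$, $\tilde S_{02} = 3$, $\tilde S_{10} = 0.5$, $\tilde S_{12} = 2$ and $J = \{2\}$; by hand, $W_J = 3$ (graph $0 \to 1 \to 2$) and $M_J(K_0) = 0.5$ (the single arrow $1 \to 0$). The mean number of steps to reach $J$ is computed exactly in rational arithmetic, since the probabilities span many orders of magnitude; the function names are ours.

```python
import math
from fractions import Fraction


def solve_exact(A, y):
    """Solve A h = y by Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(A)
    rows = [list(A[r]) + [y[r]] for r in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if rows[r][c] != 0)
        rows[c], rows[p] = rows[p], rows[c]
        for r in range(n):
            if r != c and rows[r][c] != 0:
                f = rows[r][c] / rows[c][c]
                rows[r] = [u - f * v for u, v in zip(rows[r], rows[c])]
    return [rows[r][n] / rows[r][r] for r in range(n)]


def mean_hitting_time(S, J, i, eps):
    """Expected number of steps from state i to J for p_ab = exp(-S[a][b]/eps), a != b."""
    m = len(S)
    p = [[Fraction(math.exp(-S[a][b] / eps)) if a != b else Fraction(0)
          for b in range(m)] for a in range(m)]
    free = [a for a in range(m) if a not in J]
    # (I - P) restricted to the states outside J; its diagonal 1 - p_aa is
    # written as the total exit probability sum_{b != a} p_ab.
    A = [[sum(p[a]) if a == b else -p[a][b] for b in free] for a in free]
    h = solve_exact(A, [Fraction(1)] * len(free))
    return h[free.index(i)]


S = [[0.0, 1.0, 3.0],
     [0.5, 0.0, 2.0],
     [math.inf, math.inf, 0.0]]  # hypothetical S~ table; row 2 is never used
for eps in (0.5, 0.2, 0.1, 0.05):
    h = mean_hitting_time(S, {2}, 0, eps)
    print(eps, eps * math.log(float(h)))
```

The printed values of $\epsilon \log E_0\tau_J$ stay close to $W_J - M_J(K_0) = 2.5$ and converge to it as $\epsilon \to 0$. Time is counted in chain steps here, which plays the role of the $\{Z_n\}$ transitions rather than of continuous time.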

Remark. If $x$ is not very close to some $K_i$, then the path tends to a small neighborhood of some $K_i$ 'very fast'. This fact and the theorem are enough to give us the relative asymptotic times that $\{X_n\}$ spends in any set.

Proof: If the set $G$ in Theorem 1 is replaced by $R^r - g_J = R^r - \bigcup_{j\in J} g_j$, then the 'continuity' condition (3.3) is not needed, owing to (A4.2), which allows trajectories hitting $\partial G = \partial g_J$ to be extended into the interior of $g_J$ at 'small' extra cost if $\mu_1$ is small. The above $G$ also corresponds to the set $D$ in [7, Chapter 6].

We will prove only the last two equalities of (4.2), for arbitrary $\alpha > 1$ and under the condition that all $S_{ij} = S(K_i,K_j) < \infty$. The equivalence of the first two terms follows from the calculations below.

We proceed as follows. First we show the middle equality in (4.2); then we work with $x^\epsilon(\cdot)$. The proof requires Lemma 3 below (our analog of Lemma 2.1 of [7, Chapter 6]). With this lemma, the proof can be readily completed.

The proof of the second equality in (4.2) is similar to that in Theorem 1. Fix $d > 0$. Choose $T_1$ such that the paths of (3.1) starting anywhere in $D_1$ get to $\bigcup_i g_i$ by time $T_1$ and then stay there (for the appropriate small $\mu_1$). There is such a $T_1$ since all the limit sets of (3.1) are strictly inside $\bigcup_i g_i$. There are $T_2 < \infty$ and $\phi^{ij}(\cdot) \in C_x[0,T_2]$, $x \in K_i$, $i \in I - J$, $j \in I$, such that $\phi^{ij}(t) \in N_{\mu_1}(K_j)$ for some $t \le T_2$ and
$$S_x(T_2,\phi^{ij}) \le S_{ij} + d/4,$$
$$S(x,x') \le d/4 \quad \text{for } x, x' \in N_{\mu_1}(K_i) \text{ and each } i.$$


We  can  also  suppose  (see  proof  of  Theorem  1)  that 


PB  h}  *£D1- 


Set $T = T_1 + T_2$ as in Theorem 1, and define $S_0 = \max_{i,j}[S_{ij}, \tilde S_{ij}] + d$. An argument analogous to that in Theorem 1 (part 3) shows that the contribution to the mean hitting time $E_x\tau_{k,J}$ of the time during which the coefficient equals $\epsilon_m$ (for $m > k$) is bounded above by the expression


nm  (1  -  exp-Q^/e  )  l  JI  (1  -  exp-Q?+1/e  .) 
1  1  m  o  0  1 


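where the $Q^m_i$ satisfy $Q^m_i \le S_0$. Since $N_m \ge \exp c_1\alpha^m$ for some $c_1 > 0$ and large $m$ ($N_m$ is defined at the start of part 3 of the proof of Theorem 1), the above quantity is bounded by
$$\exp\left[-\left(\exp c_1\alpha^m\right)\exp\left(-S_0\alpha^m/A_0\right)\right]\exp\left(S_0\alpha^{m+1}/A_0\right),$$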

which (for large $A_0$) goes to zero faster than a geometric series as $m \to \infty$. Also, as in Theorem 1, the contribution to the mean hitting time ($E_x\tau_{k,J}$) of the part of the path beyond the first $N_k$ interpolated $T$-intervals is asymptotically (as $k \to \infty$) negligible. Thus the first equality of (4.2) holds, and we need only work with $x^\epsilon(\cdot)$.
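The decay claim can be checked on the logarithmic scale. The sketch below evaluates the natural logarithm of the bound $\exp[-(\exp c_1\alpha^m)\exp(-S_0\alpha^m/A_0)]\exp(S_0\alpha^{m+1}/A_0)$ for hypothetical constants $c_1 = 1$, $S_0 = 2$, $\alpha = 1.5$ (the function name is ours). A geometric series would give log-terms with constant increments; with $A_0 = 4 > S_0/c_1$ the increments themselves decrease without bound, whereas with $A_0 = 1.5 < S_0/c_1$ the bound grows with $m$, which is why $A_0$ must be large.

```python
import math


def log_tail_bound(m, c1, S0, A0, alpha):
    """Natural log of exp[-(exp c1 a^m) exp(-S0 a^m / A0)] * exp(S0 a^(m+1) / A0), a = alpha."""
    am = alpha ** m
    # (exp c1 a^m) * exp(-S0 a^m / A0) = exp((c1 - S0 / A0) a^m)
    return -math.exp((c1 - S0 / A0) * am) + S0 * alpha * am / A0


# Hypothetical constants c1 = 1, S0 = 2, alpha = 1.5.
large = [log_tail_bound(m, 1.0, 2.0, 4.0, 1.5) for m in range(12)]  # A0 = 4
small = [log_tail_bound(m, 1.0, 2.0, 1.5, 1.5) for m in range(12)]  # A0 = 1.5
steps = [b - a for a, b in zip(large, large[1:])]
print([round(s, 2) for s in steps[:6]])  # increments of the log-terms, A0 = 4
print([round(v, 2) for v in small[:6]])  # log-terms grow when A0 is too small
```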


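Define $\tau_n$, $\sigma_n$ by $\tau_0 = 0$ and
$$\sigma_n = \inf\Bigl\{t > \tau_n : x^\epsilon(t) \in \textstyle\bigcup_i \Gamma_i\Bigr\},$$
$$\tau_n = \inf\Bigl\{t > \sigma_{n-1} : x^\epsilon(t) \in \textstyle\bigcup_j g_j\Bigr\}.$$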


Let $Z_n = x^\epsilon(\tau_n)$. In the following lemma, the 'conditional transition' probabilities for $\{Z_n\}$ will be estimated.


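Lemma 3. Fix $d > 0$. There is an $\epsilon_0 > 0$ such that for $\epsilon \le \epsilon_0$, all $i, j$, all $x \in \Gamma_i$ and $n \ge 1$,
$$\exp\left(-(\tilde S_{ij} + d/4)/\epsilon\right) \le P_{x,B_\epsilon(\sigma_n)}\{Z_{n+1} \in g_j\} \le \exp\left(-(\tilde S_{ij} - d/4)/\epsilon\right). \tag{4.3}$$
($P_{x,B_\epsilon(t)}$ is defined above (2.6').)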


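Proof: Fix $i, j$, $i \ne j$, and $d > 0$. There are small $\{\mu_i\}$, $\gamma_0 > 0$, $\delta_0 > 0$ and $t_1 < \infty$ such that for each $x \in \Gamma_i$ there is a path $\phi^x_{ij}(\cdot)$ on $[0,t_1]$ connecting $x$ to $K_i$ and then to $K_j$, whose distance from $g_i$ and from $g_s$ ($s \ne i, j$) is $\ge \delta_0$ after it leaves $N_{\mu_2}(K_i)$, and for which $S_x(t_1,\phi^x_{ij}) \le \tilde S_{ij} + d/4$. There is an $\epsilon_0 > 0$ such that for $\epsilon \le \epsilon_0$, $\delta_1 = \frac12\min(\delta_0,\gamma_0)$ and $x^\epsilon(\sigma_n) = x \in \Gamma_i$, we have
$$P_{x,B_\epsilon(\sigma_n)}\{Z_{n+1} \in g_j\} \ge P_{x,B_\epsilon(\sigma_n)}\Bigl\{\sup_{0\le t\le t_1}\bigl|x^\epsilon(\sigma_n+t) - \phi^x_{ij}(t)\bigr| < \delta_1\Bigr\} \ge \exp\left(-(\tilde S_{ij} + d/2)/\epsilon\right). \tag{4.4}$$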


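To get the reverse inequality, note that for any $t_2 < \infty$,
$$\begin{aligned} P_{x,B_\epsilon(\sigma_n)}\{Z_{n+1} \in g_j\} &\le \sup_{y\in\Gamma_i,\,\omega} P_{y,B_\epsilon(\sigma_n)}\{\tau_{n+1} - \sigma_n > t_2\} \\ &\quad + \sup_{y\in\Gamma_i,\,\omega} P_{y,B_\epsilon(\sigma_n)}\{Z_{n+1} \in g_j,\ \tau_{n+1} - \sigma_n \le t_2\}. \end{aligned} \tag{4.5}$$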


By Lemma 2, for any $M < \infty$ there is a $t_2 < \infty$ such that, for small $\epsilon$, the first term on the right is $\le \exp(-M/\epsilon)$.

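If $x^\epsilon(\sigma_n) = x \in \Gamma_i$, $x^\epsilon(\sigma_n + t') \in g_j$ for some $t' \le t_2$, and $x^\epsilon(\sigma_n + t) \notin K_s$, $s \ne i, j$, for $t \le t'$, then for small $\{\mu_i\}$ there is a $\delta_2 > 0$ such that
$$\sup_{0\le t\le t_2}\bigl|x^\epsilon(\sigma_n+t) - \phi(t)\bigr| \ge \delta_2 \tag{4.6}$$
for all paths $\phi(\cdot)$ with $\phi(0) = x$ and $S_x(t_2,\phi) \le \tilde S_{ij} - d/2$.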


But for small $\epsilon$ and $\{\mu_i\}$, (2.7') implies that
$$P_{x,B_\epsilon(\sigma_n)}\{\text{event defined by (4.6)}\} \le \exp\left(-(\tilde S_{ij} - d)/\epsilon\right) \tag{4.7}$$
for all $x \in \Gamma_i$ and $\omega$ (see also the related argument below (3.16)). The far right-hand term of (4.5) is bounded above by (4.7). This completes the proof of the lemma, since $M$ is arbitrary.

We now return to the proof of the theorem. Let $\{\bar Z_n\}$ denote the $\{Z_n\}$ process stopped on first reaching $g_J$. We have
$$E_x\tau_J^\epsilon = \sum_{n\ge0} E_x I_{\{\bar Z_n \notin g_J\}}(\tau_{n+1} - \tau_n). \tag{4.8}$$
For any $d > 0$, the argument of Theorem 1 yields that (for small fixed $\{\mu_i\}$) there is a $t_0 > 0$ (depending perhaps on $\{\mu_i\}$) such that, for small $\epsilon$, $t_0 \le E_{x,B_\epsilon(\tau_n)}(\tau_{n+1} - \tau_n) \le \exp(d/\epsilon)$, $x \in \Gamma_i$ (to get the right-hand side, just let $G$ decrease to a small neighborhood of $K_i$ in Theorem 1). Thus it is enough to estimate (4.8) without the $(\tau_{n+1} - \tau_n)$ component. In [7, Theorem 5.3 of Chapter 6], estimates equivalent to those of Lemma 3 for the problem in [7] (those of Lemma 2.1 of Chapter 6 there) are used to show that
$$\lim_{\epsilon\to0} \epsilon\log\sum_{n=0}^\infty E_x I_{\{\bar Z_n\notin g_J\}} = W_J - M_J(K_i)$$
for $x$ in a small neighborhood of $K_i$. Q.E.D.
5. Extensions and Comments

Cycling and asymptotic movement among the sets $K_i$. Let $J = \{j\}$, and fix $i$, where $K_i$ and $K_j$ are stable. The unstable sets $K_l$ are 'transient', in the sense that if $K_l$ is not stable then there is a $t \ne l$ such that $S_{lt} = 0$. Let $\tilde S_{ij} < \tilde S_{ik}$ for all $k \ne i, j$. Then Theorem 2 implies that
$$\lim_n a_n \log E_x\tau_J^n = \tilde S_{ij}, \qquad x \in \text{small neighborhood of } K_i,$$
and, with a 'very high' probability, $K_j$ will be the successor state to $K_i$. This is almost obvious, since if (e.g.) the optimal graph in the calculation of $W_J$ involves a link $i \to k \ne j$, then cutting that link and replacing it by the $i \to j$ link further reduces the value of $W_J$. The $M_J(K_i)$ is treated similarly. As in [7, Chapter 6], the asymptotic behavior can be described via 'cycles'. There will be groups of the $K_i$ such that for a long time the process will cycle between states within a group, then switch to another group and cycle between its states. At the next higher level, there will be a cycling between these groups. The groups themselves can be formed into higher order groups, and cycling between these described, etc. The notation is involved, but the procedure for getting the mean times for the transitions within any order of the hierarchy is quite similar to that in [7, Chapter 6, Section 6], and is based only on the analog of Theorem 2 and Lemma 3 for the problem in [7]. The procedure yields the (asymptotic) mean time spent in the various states.
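The first level of the cycle hierarchy can be read off from a table of $\tilde S_{ij}$: from each stable set the most likely successor is the $K_j$ with the smallest $\tilde S_{ij}$, and the cycles of this successor map are the lowest-order groups. (By Theorem 2 with $J = I - \{i\}$, where $W_J = \min_{j\ne i}\tilde S_{ij}$ and $M_J(K_i) = 0$, the exit-time exponent from $K_i$ is that same minimum.) The sketch below, with a hypothetical four-set table and function names of our own, computes only this first level; the higher levels in [7, Chapter 6, Section 6] require recomputing the costs between groups, which it does not attempt.

```python
def successors(S):
    """For each i, the j != i with the smallest S[i][j] (the most likely next stable set)."""
    m = len(S)
    return [min((j for j in range(m) if j != i), key=lambda j: S[i][j]) for i in range(m)]


def first_level_cycles(succ):
    """Cycles of the successor map i -> succ[i], each returned as a sorted list."""
    cycles, visited = [], set()
    for start in range(len(succ)):
        path, v = [], start
        while v not in visited and v not in path:
            path.append(v)
            v = succ[v]
        if v in path:
            cycles.append(sorted(path[path.index(v):]))
        visited.update(path)
    return cycles


S = [[0.0, 1.0, 3.0, 4.0],   # hypothetical S~_{ij} among four stable sets
     [0.8, 0.0, 2.5, 5.0],
     [3.0, 3.0, 0.0, 1.2],
     [4.0, 4.0, 0.9, 0.0]]
succ = successors(S)
print(succ)                      # [1, 0, 3, 2]
print(first_level_cycles(succ))  # [[0, 1], [2, 3]]
print([min(S[i][j] for j in range(4) if j != i) for i in range(4)])  # [1.0, 0.8, 1.2, 0.9]
```

In this example the process circulates between $K_0$ and $K_1$, and between $K_2$ and $K_3$, long before it moves from one pair to the other.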




Itô equations. Let
$$dy = b(y)\,dt + a(t)\sigma(y)\,dw, \qquad a^2(t) = A_0/\log(t+A_1), \tag{5.1}$$
where $w(\cdot)$ is a standard $R^r$-valued Wiener process. Define $y^n(\cdot)$ by
$$dy^n = b(y^n)\,dt + a(n+t)\sigma(y^n)\,dw, \qquad t \ge 0. \tag{5.2}$$


If $\sigma(y)\sigma'(y)$ is positive definite in the interior of $D_1$ (see (A4.2)), then the action functional for $\{y^n(\cdot)\}$ is
$$S_x(T,\phi) = \frac12\int_0^T \bigl(\dot\phi(s) - b(\phi(s))\bigr)'\bigl[\sigma(\phi(s))\sigma'(\phi(s))\bigr]^{-1}\bigl(\dot\phi(s) - b(\phi(s))\bigr)\,ds$$
for $\phi(\cdot)$ absolutely continuous, and it equals infinity otherwise. In general,
$$S_x(T,\phi) = \int_0^T L(\dot\phi(s),\phi(s))\,ds,$$
where
$$L(\beta,x) = \sup_\alpha\Bigl[\alpha'(\beta - b(x)) - \tfrac12\alpha'\sigma(x)\sigma'(x)\alpha\Bigr].$$
The obvious analogs of Theorems 1 and 2 and Lemma 3 hold with $a_n$ replaced by $a^2(n)$ and $\epsilon$ by $\epsilon^2$.
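The Legendre-transform form of $L(\beta,x)$ reduces to the quadratic action above whenever $\sigma(x)\sigma'(x)$ is invertible: the concave quadratic in $\alpha$ is maximized at $\alpha^* = (\sigma\sigma')^{-1}(\beta - b(x))$, which gives $\frac12(\beta - b)'(\sigma\sigma')^{-1}(\beta - b)$. The two-dimensional sketch below uses hypothetical numbers for $\beta$, $b(x)$ and $\sigma\sigma'$ and function names of our own; it computes the closed form and checks that perturbing $\alpha^*$ only lowers the objective.

```python
def lagrangian_closed_form(beta, b, a):
    """For a 2x2 symmetric positive definite a = sigma sigma', return
    (1/2)(beta - b)' a^{-1} (beta - b) and the maximizer alpha* = a^{-1}(beta - b)."""
    d = [beta[0] - b[0], beta[1] - b[1]]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    inv = [[a[1][1] / det, -a[0][1] / det],
           [-a[1][0] / det, a[0][0] / det]]
    alpha = [inv[0][0] * d[0] + inv[0][1] * d[1],
             inv[1][0] * d[0] + inv[1][1] * d[1]]
    return 0.5 * (d[0] * alpha[0] + d[1] * alpha[1]), alpha


def legendre_objective(alpha, beta, b, a):
    """alpha'(beta - b) - (1/2) alpha' a alpha: the expression maximized in L(beta, x)."""
    d = [beta[0] - b[0], beta[1] - b[1]]
    a_alpha = [a[0][0] * alpha[0] + a[0][1] * alpha[1],
               a[1][0] * alpha[0] + a[1][1] * alpha[1]]
    linear = alpha[0] * d[0] + alpha[1] * d[1]
    quadratic = alpha[0] * a_alpha[0] + alpha[1] * a_alpha[1]
    return linear - 0.5 * quadratic


beta, b = [1.0, -0.5], [0.2, 0.3]  # hypothetical values of phi'(s) and b(phi(s))
a = [[2.0, 0.5], [0.5, 1.0]]       # hypothetical sigma(x) sigma'(x)
value, alpha_star = lagrangian_closed_form(beta, b, a)
print(value)  # approximately 0.7314 (= 2.56 / 3.5)
for delta in ([0.1, 0.0], [0.0, -0.2], [0.3, 0.3]):
    trial = [alpha_star[0] + delta[0], alpha_star[1] + delta[1]]
    print(legendre_objective(trial, beta, b, a) < value)  # True: alpha_star is the maximizer
```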

Invariant measures for the $y(\cdot)$ of (1.2), (5.1). Let $\sigma(x)\sigma'(x)$ be bounded and uniformly positive definite in the interior of $D_1$, assume that there are only a finite number of $K_i$, and let all trajectories of (3.1) starting in $D_1$ stay in $D_1$. Let $y^\epsilon(\cdot)$ denote the solution to (1.2) with $a(t)$ replaced by $\epsilon$. In [7, Chapter 6, Theorems 4.1 to 4.3], an expression for the invariant measure $\nu^\epsilon$ of $y^\epsilon(\cdot)$ is given (for small $\epsilon > 0$).

Let $\nu(t)$ denote the measure of $y(t)$. Then $\nu(t) - \nu^{a(t)} \to$ zero measure weakly. Thus, for large $t$, the measure of $y(t)$ is very close to the stationary measure of $y^\epsilon(\cdot)$ for $\epsilon = a(t)$. We will not go through the details, but they follow from the following considerations. Replace $a(n+t)$ by a piecewise constant approximation as in Theorem 1; i.e., use $y^k(\cdot)$, where we define (for any $\alpha > 1$ and some $T_0 > 1$) for each $k$
$$dy^k = b(y^k)\,dt + a(k,t)\sigma(y^k)\,dw, \tag{5.3}$$

Tn+1  '  ^  '  Tf  •  "  i  »•  4  ‘  A0''»k-  T0  '  0 
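$$a(k,t) = \begin{cases} \epsilon_k & \text{on } [0,\ \tau_{k+1} - \tau_k), \\ \epsilon_{k+1} & \text{on } [\tau_{k+1} - \tau_k,\ \tau_{k+2} - \tau_k), \text{ etc.} \end{cases}$$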

The measure $\nu^\epsilon$ in [7] is obtained from the invariant measure of the $\{Z_n\}$ process, where we define $\{Z_n\}$ here as in Theorem 2, but using the $y^k(\cdot)$ of (5.3) instead of the $\{x^\epsilon(\cdot)\}$. In fact, if $\tilde\nu^\epsilon$ denotes the invariant measure of the $\{Z_n\}$ for parameter $\epsilon$, and $g = \bigcup_j g_j$, then [7, (4.1) of Chapter 6]
$$\nu^\epsilon(B) = \int_g \tilde\nu^\epsilon(dy)\,E_y\int_0^{\tau_1} I_B(y^\epsilon(t))\,dt.$$

There is an $\bar A_0 < \infty$ such that for $A_0 \ge \bar A_0$, the number of transitions of $\{Z_n\}$ on the interval $[\tau_k, \tau_{k+1})$ increases rapidly enough as $k \to \infty$ that a 'near' steady state is reached before the end of the $k$th interval, for large $k$. To see this, note the following: (a) all $S_{ij} < \infty$ and $\tilde S_{ij} < \infty$; (b) for any $d > 0$, the maximum modulus of the eigenvalues (with modulus less than unity) of the transition probabilities of the chain $\{Z_n\}$ on the $k$th interval is $\le 1 - \exp(-(S_0+d)/a^2(\tau_k)) = 1 - \exp(-(S_0+d)\alpha^k/A_0)$ for large $k$; (c) the length of the $k$th interval is $\ge \exp c_1\alpha^k$ for some $c_1 > 0$. Now let $N_k = \exp c_2\alpha^k$ for $0 < c_2 < c_1$. Then
$$P\Bigl\{\sum_{n\le N_k}(\tau_{n+1}-\tau_n) \ge \exp c_1\alpha^k\Bigr\} \le \frac{N_k\exp[(S_0+d)\alpha^k/A_0]}{\exp c_1\alpha^k} \le \exp(-c_3\alpha^k)$$
for some $c_3 > 0$, for large enough $A_0$. Finally, note that
$$\bigl[1 - \exp(-(S_0+d)\alpha^k/A_0)\bigr]^{N_k} \le \exp(-c_4\alpha^k)$$
for some $c_4 > 0$ if $A_0$ is large enough. The assertion concerning convergence to the invariant measure follows from this.
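The two estimates at the end of this argument can also be checked on the logarithmic scale. In the sketch below, the constants $c_1 = 1$, $c_2 = 0.5$, $S_0 + d = 2$ and $\alpha = 1.5$ and the function names are hypothetical. The first function is the log of the middle expression of the probability bound, and the second is the log of $[1 - \exp(-(S_0+d)\alpha^k/A_0)]^{N_k}$ with $N_k = \exp c_2\alpha^k$. With $A_0 = 10$ both are at most $-0.3\alpha^k$ (so $c_3 = c_4 = 0.3$ works here), while with $A_0 = 1$ the first is positive and the estimate says nothing.

```python
import math


def log_markov_bound(k, c1, c2, S0d, A0, alpha):
    """log of N_k exp[(S0 + d) a^k / A0] / exp(c1 a^k), with N_k = exp(c2 a^k) and a = alpha."""
    ak = alpha ** k
    return (c2 + S0d / A0 - c1) * ak


def log_mixing_bound(k, c2, S0d, A0, alpha):
    """log of [1 - exp(-(S0 + d) a^k / A0)]^(N_k), with N_k = exp(c2 a^k)."""
    ak = alpha ** k
    n_k = math.exp(c2 * ak)
    # log1p keeps accuracy when exp(-(S0 + d) a^k / A0) is tiny
    return n_k * math.log1p(-math.exp(-S0d * ak / A0))


for k in (0, 5, 10, 15):
    print(k,
          log_markov_bound(k, 1.0, 0.5, 2.0, 10.0, 1.5),
          log_mixing_bound(k, 0.5, 2.0, 10.0, 1.5),
          -0.3 * 1.5 ** k)
```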

The potential case. Let $b(x) = -B_x(x)$ and use the process $y(\cdot)$ of (1.2). For simplicity, add a penalty function so that $b(x)$ points strictly inward on $\partial D_1$ for some sphere $D_1$. Let $B(\cdot)$ be continuously differentiable, and assume that there are only a finite number of the compact $K_i$ introduced in Section 4, and that $B(x) \to \infty$ as $|x| \to \infty$. Let $\sigma(x) = I$ except close to $\partial D_1$. Since $\sigma(x)\sigma'(x) =$ identity matrix 'inside' $D_1$, (A4.1) holds.

For this case and $x, y$ not close to $\partial D_1$, $S(x,y)$ has a simple characterization as
$$S(x,y) = \inf\{\text{sums of (positive) increases in } B(\phi(t)) \text{ as } \phi(t),\ t \le T, \text{ moves from } x \text{ to } y\},$$
where the inf is over all differentiable paths connecting $x$ and $y$. There is a similar definition for $S_{ij}$ and $\tilde S_{ij}$.

* Recall $S_0 = \max_{i,j}[S_{ij}, \tilde S_{ij}] + d$.

The same comments apply to the system (1.1) if $b(x,\xi) = b(x)$. In these cases, the invariant measure $\nu^\epsilon$ is concentrated on an arbitrarily small neighborhood of the set of global minima of $B(\cdot)$ for small $\epsilon$ [7]. Let $\nu^n$ denote the measure of $X_n$. Then $\nu(t)$ and $\nu^n$ are both ultimately concentrated near the set of global minima of $B(\cdot)$ as well. This includes the 'annealing' result of [8].
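In one dimension with $\sigma \equiv 1$, the diffusion $dy = -B_x(y)\,dt + \epsilon\,dw$ has stationary density proportional to $\exp(-2B(y)/\epsilon^2)$, which makes the concentration statement above easy to see numerically. The sketch below assumes a hypothetical tilted double well $B(x) = (x^2-1)^2 + 0.3x$, whose global minimum is near $x \approx -1.035$ and whose other local minimum, near $x \approx 0.96$, is higher by about 0.6; it computes the stationary mass within 0.3 of the global minimizer for several $\epsilon$. It checks only the constant-$\epsilon$ measure $\nu^\epsilon$, not the time-varying $\nu(t)$, and the function names are ours.

```python
import math


def B(x):
    """Hypothetical tilted double well; global minimum near x = -1.035."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def mass_near_global_min(eps, lo=-2.5, hi=2.5, n=20001, radius=0.3):
    """Stationary mass of dy = -B'(y) dt + eps dw within `radius` of the grid minimizer of B."""
    h = (hi - lo) / (n - 1)
    xs = [lo + k * h for k in range(n)]
    bs = [B(x) for x in xs]
    b_min = min(bs)
    x_star = xs[bs.index(b_min)]
    # Unnormalized density exp(-2B/eps^2); subtracting b_min avoids overflow.
    # The grid spacing is uniform, so it cancels in the ratio below.
    w = [math.exp(-2.0 * (bv - b_min) / eps ** 2) for bv in bs]
    near = sum(wk for x, wk in zip(xs, w) if abs(x - x_star) <= radius)
    return x_star, near / sum(w)


for eps in (1.0, 0.55, 0.3, 0.2):
    x_star, frac = mass_near_global_min(eps)
    print(f"eps={eps}: minimizer {x_star:.3f}, mass within 0.3 = {frac:.4f}")
```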

Global function minimization via Monte Carlo.* In many applications, one can choose the noise $\xi_n$ in $b(X_n,\xi_n)$, and often there are choices which greatly enhance the search. Let $b(x) = -B_x(x)$, where $B(\cdot)$ and $\sigma(\cdot)$ satisfy the conditions in the above 'potential case' subsection. Then both (A4.1) and (A4.2) hold. For each $m$, choose $\xi^m_i$, $i \le m$, such that $b(x,\xi) = -B_x(x,\xi)$**, $B(x) = EB(x,\xi)$, and also such that
$$\frac1m\sum_{i=1}^m b(x,\xi^m_i) \equiv b^m(x) \to b(x)$$

uniformly for $x$ in any compact set. Define $\xi_{n,m} = \xi^m_{n-km}$ for $km < n \le (k+1)m$, $k = 0, 1, \ldots$. We use
$$X_{n+1} = X_n + a_n b(X_n,\xi_{n,m}) + a_n\psi_n \tag{5.4}$$
and the $x^n(\cdot)$, $\tilde x^n(\cdot)$, $x^\epsilon(\cdot)$ obtained from it, as in the previous section.

With this scheme, the measure of $X_n$ will ultimately be concentrated near the set of global minima of $\frac1m\sum_{i=1}^m B(\cdot,\xi^m_i)$.
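A minimal simulation of the cyclic scheme (5.4) follows, under several assumptions that are ours, not the paper's. We take $B(x) = (x^2-1)^2 + 0.3x$ in one dimension and noisy samples $B(x,\xi) = B(x) + \xi x$ with $\xi$ standard normal, so $b(x,\xi) = -B_x(x) - \xi$ and $EB(x,\xi) = B(x)$. The $\xi^m_i$ are the normal quantiles at $(i - 1/2)/m$, a simple stratified (variance-reducing) choice whose average is zero by symmetry, so $b^m = b$ here. We also take $\sigma \equiv 1$ and clip the path to $[-1.6, 1.6]$ as a stand-in for the path-bounding device of (A4.2); projection is the alternative mentioned in Section 4. The function and parameter names, $A_0 = 1$ and $A_1 = e^6$, are hypothetical.

```python
import math
import random
from statistics import NormalDist


def B_x(x):
    """Derivative of the hypothetical objective B(x) = (x^2 - 1)^2 + 0.3 x."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def b(x, xi):
    """Drift for the hypothetical samples B(x, xi) = B(x) + xi * x, i.e. b(x, xi) = -B_x(x, xi)."""
    return -B_x(x) - xi


def stratified_samples(m):
    """xi_i^m, i = 1..m: standard normal quantiles at (i - 1/2)/m."""
    nd = NormalDist()
    return [nd.inv_cdf((i - 0.5) / m) for i in range(1, m + 1)]


def run_cyclic_scheme(n_steps, m, A0=1.0, A1=math.exp(6.0), x0=0.9, bound=1.6, seed=0):
    """Iterate X_{n+1} = X_n + a_n b(X_n, xi_{n,m}) + a_n psi_n, reusing the m samples cyclically."""
    rng = random.Random(seed)
    xis = stratified_samples(m)
    x = x0
    for n in range(n_steps):
        a_n = A0 / math.log(n + A1)
        xi = xis[n % m]                 # xi_{n,m}: cycle through xi_1^m, ..., xi_m^m
        psi = rng.gauss(0.0, 1.0)       # i.i.d. Gaussian 'annealing' noise
        x = x + a_n * b(x, xi) + a_n * psi
        x = max(-bound, min(bound, x))  # keep the path bounded (our stand-in for A4.2)
    return x


print(run_cyclic_scheme(20000, m=8))
```

With such a slowly decreasing $a_n$ the search is long-lived by design; the sketch makes no claim about how many steps are needed before the iterates settle near the global minimum.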

Let $S^m_x(T,\phi)$ be the action functional which corresponds to $x^n(\cdot)$ for given $m$. Then
$$S^m_x(T,\phi) = \int_0^T L^m(\dot\phi(s),\phi(s))\,ds,$$
where
$$L^m(\beta,x) = \sup_\alpha\bigl[\alpha'(\beta - b^m(x)) - \alpha'\alpha/2\bigr].$$

* In Monte Carlo optimization by simulation.

** We observe the noise-corrupted function $B(x,\xi)$ and its gradient, where $EB(x,\xi) = B(x)$.

Define $S^m_{ij}$ and $\tilde S^m_{ij}$ in the analogous way, and let the superscript 'o' denote the case where $b^m(x)$ is replaced by $b(x)$. Theorem 2 and Lemma 3 hold for each $m$. As $m \to \infty$, $S^m_{ij} \to S^o_{ij}$ and $\tilde S^m_{ij} \to \tilde S^o_{ij}$, and we have
$$\lim_m \lim_n a_n \log E_x\tau^n_J = W^o_J - M^o_J(K_i), \tag{5.5}$$
where the limit is uniform for $x$ in a small neighborhood of $K_i$. Thus, for large enough $m$, as $t \to \infty$ the path $\{x^n(\cdot)\}$ will spend almost all of its time in a small neighborhood of the set of global minima of $B(\cdot)$.

Numerous variations are possible. The $\xi^m_i$ can be chosen randomly, but according to some good 'variance reduction' method, with the $\xi^m_i$ possibly dependent only within a 'cycle'. We could let the cycle length be $m_k \to \infty$, and use $\{\xi^{m_k}_i\}$ in the $k$th cycle, etc.